Abstract:

Complex  Event  Processing,  or  CEP,  is  an  event  processing  technique  that  analyzes  multiple 
events  with  the  goal  of  identifying  meaningful  complex  events  within  an  event  cloud.  CEP 
employs  techniques  such  as  detection  of  complex  patterns  of  many  events,  event  correlation  and 
abstraction,  computable  event  hierarchies,  and  event  relationships  such  as  causality,  membership, 
timing,  and  event  sequences.  In  this  paper  we  present  a  detailed  analysis  of  the  characteristics  of 
Multi-INT  data  streams  from  an  Expeditionary  and  Irregular  Warfare  (EIW)  environment.  After 
evaluating  these  characteristics  we  propose  a  solution  using  approximate,  incremental  graph 
pattern  search  algorithms.  Finally,  we  present  a  prototype  implementation  of  these  algorithms 
and  a  preliminary  evaluation  of  their  use  and  performance. 

Keywords:  multi-INT,  data  fusion,  complex  event  processing,  graph  theory,  graph  patterns, 
graph  search,  expeditionary  warfare. 


1.  Introduction. 

A  complex  event  is  a  composite  of  simpler  events.  The  components  of  a  particular  complex 
event  are  frequently  variable  and  spread  across  a  significant  period  of  time.  Complex  Event 
Processing  (CEP)  is  concerned  with  finding  these  complex  events  in  both  large  collections  of 
events and event streams. Many intelligence gathering systems produce high-volume input and
output streams of simple events [1]. Many systems also store events for some period of time
depending  on  user  need  for  history,  information  fusion,  or  other  system  processing.  Warfighters 
demand  that  these  streams  and  collections  be  examined  often  and  in  near  real-time  for  situation 
awareness,  force  protection,  and  force  projection.  To  meet  this  demand,  processing  strategies  and 
algorithms  are  needed  to  automate  detection  of  events  in  clutter. 

The ideas in the CEP literature appear to have had their genesis in the active database community
[3, 47], and discussion has continued recently [41]. Current CEP techniques have been widely
discussed in the business-oriented data management community for a lengthy period [9, 19, 20,
21, 23, 25, 32, 41, 43, 46]. Additional application areas include intrusion detection [17],
provenance and workflow management [40], and software maintenance [28].

Our  first  approach  to  detecting  complex  events  in  multi-INT  Expeditionary  and  Irregular 
Warfare  (EIW)  data  streams  was  to  try  adapting  current  business-oriented  CEP  techniques  for 
use on multi-INT data streams. In the next subsection of this paper we will present our analysis
of the difficulties and shortcomings of this initial approach. In the second section of this paper we will
discuss  an  approach  that  we  took  and  the  expected  benefits.  In  the  third  section  we  present  our 
dynamic  complex  event  processing  algorithms  based  on  graph  pattern  search.  In  the  fourth 
section  we  will  outline  our  prototype  implementation  and  show  an  example  of  its  operation.  In 
the  fifth  and  final  section  we  will  present  the  results  of  a  preliminary  examination  of  the 
performance  of  our  algorithms  compared  to  a  standard  search  algorithm. 


2.  Multi-INT  data  streams. 

Our  initial  examination  of  current  event  processing  techniques  and  implementations  revealed  that 
the  data  operated  on  is  generally  well  defined.  For  example,  financial  transactions  are  usually 
generated by machines and while the volume may be high, the definitions and character of the
messages are rigidly defined and display little variance. In contrast, military messages are
frequently  generated  by  humans,  and  while  the  format  of  the  messages  may  be  defined,  the 
content  is  often  not  constrained.  Reporting  is  often  subjective,  arbitrarily  delayed,  and 
incomplete. This makes current open-source and commercial CEP engines [12, 33, 34, 37, 38,
45] unsuitable for use in Expeditionary and Irregular Warfare (EIW) environments. We
addressed this challenge through characterization of event streams, generalized search
methods, and the use of dynamic algorithms best suited to EIW user needs.

Since  our  work  is  aligned  with  the  EIW  development  environment  the  first  step  we  undertook 
was  an  informal  survey  of  related  projects  to  determine  data  storage  facilities  and  formats.  We 
found  that  graph-based  notations  are  dominant  in  EIW  Science  and  Technology  (S&T)  projects. 
Projects are increasingly sharing results of data analysis in Resource Description Framework (RDF)
[30], a recently released World Wide Web Consortium (W3C) [44] standard for flexible
knowledge markup. Figure 1 shows a typical workflow of several US Navy development projects
we  reviewed.  RDF  is  an  explicitly  specified  directed  graph  format  where  nodes  represent  known 
entities  and  directed  edges  represent  a  defined  relationship  between  the  source  and  sink  nodes. 
The use of RDF enables data and service sharing, but extended use requires agreed definitions of
entities and relationships, and these are still in flux. Despite this uncertainty we feel that graphical
notations  will  dominate  future  data  storage  and  processing  environments  in  EIW. 

Figure 1. The Expeditionary and Irregular Warfare (EIW) data environment requires multi-INT
data  sources.  The  S&T  community  is  increasingly  using  RDF  for  data  storage. 
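
The directed-graph reading of RDF described above can be illustrated with a minimal sketch in plain Python. The triples, the entity names, and the helper functions `build_graph` and `neighbors` are hypothetical and chosen only for illustration; they are not taken from the exercise data or from any RDF library.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples, for illustration only.
TRIPLES = [
    ("person:A", "owns", "vehicle:truck1"),
    ("person:A", "associatedWith", "person:B"),
    ("person:B", "hasAccessTo", "facility:shed1"),
]


def build_graph(triples):
    """Build a directed, edge-labeled graph: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
        graph.setdefault(obj, [])  # sink nodes appear as keys with no out-edges
    return dict(graph)


def neighbors(graph, node, predicate=None):
    """Return the sink nodes reachable from `node`, optionally filtered by edge label."""
    return [o for p, o in graph.get(node, []) if predicate is None or p == predicate]


graph = build_graph(TRIPLES)
print(neighbors(graph, "person:A"))
```

Each triple becomes one directed edge whose label is the predicate, which is the property the graph search methods in the later sections rely on.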


The  essential  function  in  any  event  processing  system  is  some  method  of  pattern  finding.  The 
RDF standard has an associated query language named SPARQL [36]. SPARQL on an RDF
store serves a purpose similar to SQL on a relational database store. One problem in using
SPARQL to find complex events is that the presentation of events is variable. Significant event
patterns can be buried in large amounts of data over extended time periods, so they may not be
evident or may require multiple queries. Additionally, the RDF store undergoes considerable change
and may contain gaps in information or out-of-order entries. Due to these factors, the results
of a query can be non-deterministic. Since the essential format of RDF information is a graphical
network  we  conducted  a  survey  of  network  algorithms  available  in  the  literature. 
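
The sensitivity of exact queries to gaps and late arrivals can be shown with a small sketch. The function below is not SPARQL and does not use a SPARQL library; it is a hypothetical, simplified exact matcher over triples (terms starting with `?` are variables), and the triples and query are invented for illustration.

```python
def match_patterns(triples, patterns):
    """Return every variable binding that satisfies all patterns exactly."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


# Hypothetical query: a person linked to IED activity who has access to something.
QUERY = [("?p", "linkedTo", "activity:ied"), ("?p", "hasAccessTo", "?v")]

early_store = [("person:A", "linkedTo", "activity:ied")]
late_triple = ("person:A", "hasAccessTo", "vehicle:truck1")

print(match_patterns(early_store, QUERY))                  # before the late report
print(match_patterns(early_store + [late_triple], QUERY))  # after it arrives
```

The same query returns nothing until the delayed triple arrives, so the answer depends on when the query is issued relative to the reporting.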

For  our  survey  we  divided  the  network  literature  into  three  distinct  regimes:  static  networks  [35], 
dynamic  networks  [4,  11,  14,  29],  and  event  sequences  [2,  5,  7,  8,  15,  16,  24,  26,  47].  Figure  2 
shows  the  general  characteristics  of  each  of  these  literature  regimes.  In  general  we  find  that  EIW 
data  streams  contain  very  little  static  information.  Static  information  is  limited  to  physical 
geography and major structures. While social networks are usually present, the relations are not
predominantly concurrent, and relations and dependencies across the data are complex. Similarly,
while  much  of  the  data  consists  of  some  approximate  sequence,  sequence  may  be  only  roughly 
decidable.  Likewise,  persistence  and  concurrency  are  variable.  In  contrast,  characterizations  of 
dynamic  network  data  fit  the  EIW  data  environment  well  for  reasons  shown  in  the  figure. 


Figure  2.  Three  distinct  network  regimes  have  been  explored  in  the  literature.  CEP  in  EIW  event 
streams  rarely  has  the  characteristics  of  static  network  data. 


3.  Approximate,  incremental  graph  pattern  search. 

The  characteristics  of  C4ISR  data  environments,  and  EIW  in  particular,  suggest  a  more  general 
approach  than  those  previously  reported  in  CEP  literature.  Since  the  information  environment  is 
often  stored  and  visualized  as  a  graph  this  suggests  an  approach  based  on  some  method  of 
approximate  graph  pattern  matching  [6,  10,  13,  29,  39,  49].  However,  the  data  stream  creates  a 
dynamic  environment.  In  this  section  we  will  present  such  an  algorithm  based  on  new  work  [13] 
in  combination  with  previous  incremental  update  techniques  for  dynamic  graphs  [29].  Our 
requirements  are  depicted  graphically  in  Figure  3.  We  wish  to  search  for  approximate  matches 
across  RDF  defined  by  multiple  ontologies  and  potentially  disconnected  graphs. 


Figure  3.  The  graphic  illustrates  how  data  triples  (subject-predicate-object)  can  be  associated 
into ontologies (O1, O2, O3) and these in turn can be mapped into a complex graph.


Of particular interest to our research is recent work [13] that presents a polynomial-time algorithm
for graph pattern search and constructs a proof of its run time. This suggests that the algorithm will scale well
enough  to  handle  large  data  sets  encountered  in  many  C4ISR  applications.  However,  since  it 
assumes  a  static  graph,  it  does  not  take  into  account  the  scale  of  change  in  the  underlying  graph 
that  would  be  expected.  Earlier  work  by  Ramalingam  [29]  provides  a  basis  for  incrementally 
updating  data  structures  that  may  allow  tractable  graph  pattern  searches.  Figure  4  shows  pseudo 
code  for  our  graph  pattern  match  algorithm  [29]. 

Graph Pattern Match

Input: Pattern P=(Vp,Ep), Data Graph G=(V,E), and Ontology O

Initial step: compute all-pairs shortest-path matrix M.

A. Find descended nodes, DVp=desc(Vp), in ontology of each pattern
graph node.

B. Compute potential matching set in G for each node in DVp.

C. Traverse paths in the matching set, examining path length and edge
type. Remove nodes that are not connected, do not meet path
constraints, or do not have correct edge type.

Figure  4.  Pseudocode  for  our  graph  pattern  match  algorithm.  The  distance  matrix  M  is  updated 
separately  using  the  algorithm  in  [29]. 
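
The steps in Figure 4 can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the prototype itself: edge types are ignored (only path-length bounds are checked), pruning is applied to both ends of each pattern edge, and all node names, class names, bounds, and function names are hypothetical.

```python
from collections import deque


def all_pairs_shortest_paths(adj):
    """Initial step: BFS from every node; dist[u][v] is the hop count from u to v."""
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        dist[source] = seen
    return dist


def descendants(ontology, cls):
    """Step A: the class itself plus every class below it in the ontology."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(ontology.get(current, []))
    return found


def pattern_match(pattern_nodes, pattern_edges, adj, node_type, ontology):
    dist = all_pairs_shortest_paths(adj)
    # Step B: candidate data nodes for each pattern node, by ontology class.
    cand = {}
    for p, cls in pattern_nodes.items():
        allowed = descendants(ontology, cls)
        cand[p] = {v for v in adj if node_type[v] in allowed}

    def close(u, v, bound):
        return u != v and dist[u].get(v, bound + 1) <= bound

    # Step C: repeatedly drop candidates that violate a path-length bound.
    changed = True
    while changed:
        changed = False
        for (p, q), bound in pattern_edges.items():
            keep_p = {u for u in cand[p] if any(close(u, v, bound) for v in cand[q])}
            keep_q = {v for v in cand[q] if any(close(u, v, bound) for u in keep_p)}
            if keep_p != cand[p] or keep_q != cand[q]:
                cand[p], cand[q] = keep_p, keep_q
                changed = True
    return cand if all(cand.values()) else None


# Hypothetical ontology, data graph, and pattern (bounds are maximum hop counts).
ontology = {"Vehicle": ["Car", "Truck"], "StorageFacility": ["Shop", "House"],
            "IEDActivity": ["IEDPlacement"]}
adj = {"p1": ["ied1", "p2", "truck1"], "p2": ["shed1", "fert1"], "p3": ["car1"],
       "ied1": [], "truck1": [], "shed1": [], "fert1": [], "car1": []}
node_type = {"p1": "Person", "p2": "Person", "p3": "Person", "ied1": "IEDPlacement",
             "truck1": "Truck", "car1": "Car", "shed1": "Shop", "fert1": "Fertilizer"}
pattern_nodes = {"suspect": "Person", "activity": "IEDActivity",
                 "fertilizer": "Fertilizer", "vehicle": "Vehicle",
                 "storage": "StorageFacility"}
pattern_edges = {("suspect", "activity"): 1, ("suspect", "fertilizer"): 2,
                 ("suspect", "vehicle"): 2, ("suspect", "storage"): 2}

result = pattern_match(pattern_nodes, pattern_edges, adj, node_type, ontology)
print(result)
```

A bound of 2 on an edge lets the suspect reach a resource either directly or through one intermediary, which is one way to express approximate matching; the match sets only shrink, so the pruning loop terminates.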


4.  Prototype  implementation. 

In this section we describe our proposed system and prototype implementation of our algorithms,
and present an example using data generated by the US Marine Corps. The architecture of our
proposed system is shown in Figure 5. For our study, we assume access to a variety of
multi-INT data streams in RDF format that could be stored and managed centrally. The lifetime
of  the  data  will  be  expected  to  vary  with  the  capacity  of  the  overall  system  and  the  needs  of  the 
processing  systems  generating  and  consuming  the  data.  A  prototype  was  constructed  to  show  a 
proof-of-concept  for  identifying  complex  events.  The  complex  events  are  built  from  simple 
events  that  can  arrive  through  separate  event  streams.  It  is  necessary  to  combine  the  data 
through  a  common  lexicon  and  ontology.  Python  scripts  were  used  to  simplify  prototype 
implementation. 

The data used for the proof-of-concept was from a Second Marine Expeditionary Force (IIMEF)
experiment that took place at Camp Lejeune, Dec 13-15, 2011. A use case was constructed for
emplacing an Improvised Explosive Device (IED). This involved vehicles, individual dismounts,
and activity alongside a road. Contextual information and prior relationships were established
among the vehicles, individuals, and the area where the activity occurred. The data arrived in the form
of Intel reports (e.g. DIIRs) and tactical reports (e.g. TACREPS). This data was tagged,
associated and analyzed. The collection consists of 35 short text reports prepared during the first
phase of the exercise, Intelligence Preparation for the Battlefield (IPB). The IPB reports describe
a fictional background scenario spanning several days before a Marine squad undertakes
movement into and then out of a fictional Afghan village. The objective of our CEP system is to
identify potential IED-related activity in these reports.


Figure  5.  Graphic  diagram  of  processing  method  that  inputs  event  streams,  converts  data  sources 
to  metadata,  analyzes  metadata  with  CEP  algorithms  and  outputs  complex  events. 


We  reduced  the  content  of  the  reports  to  RDF  by  hand.  In  a  working  prototype  this  step  would  be 
automated  in  cooperation  with  other  systems  to  reliably  identify  entities  and  relationships  and 
encode them. The output of our encoding is an RDF graph of all IPB reports. The complexity of the
graph makes human interpretation very difficult. In addition, we created an informal ontology
for the objects and relationships in the IPB reports. This ontology is used to find the descendant
nodes for the pattern matching algorithm. The hierarchy of entities and relationships can be used to infer
groups  and  similarities.  For  example,  a  storage  facility  may  be  any  building  or  enclosed 
structure,  and  a  vehicle  may  be  a  car,  truck,  or  bus.  In  practice,  an  ontology  would  be  developed 
cooperatively  with  the  other  systems  contributing  to  the  multi-INT  data  streams. 

Finally,  Figure  6  shows  an  example  of  a  graph  pattern  specifying  a  potential  threat  related  to  IED 
activity.  In  this  case  we  are  looking  for  persons  with  a  previous  association  with  some  type  of 
IED  activity  who  have  direct  access,  or  are  linked  to  persons  with  access  to,  fertilizer,  a  vehicle, 
and  a  storage  facility.  For  example,  IED  activity  could  include  IED  funding,  manufacture, 
placement,  transport,  or  triggering.  Examples  of  a  vehicle  could  include  cars,  trucks,  and  other 
vehicles.  A  storage  facility  may  include  a  shop,  house,  or  out  building. 


Figure  6.  An  example  of  a  graph  pattern  for  complex  events  related  to  IEDs.  In  this  case  we  are 
searching  for  any  set  of  relationships  involving  a  person  previously  linked  to  IED  activity  and 
persons  with  access  to  fertilizer,  a  vehicle,  and  a  storage  facility. 


The  IPB  reports  which  were  the  source  of  the  colored  nodes  in  the  threat  warning  output  are 
shown  in  Figure  7.  The  full  data  graph  with  similarly  colored  nodes  is  shown  in  Figure  8.  The 
IPB  reports  were  processed  roughly  in  the  order  shown.  It  can  be  seen  that  earlier  information  in 
reports DIIR 1-05 and DIIR 1-08 was later connected with information in TACREP 4-13.
Our system builds the data graph incrementally and raises a threat warning whenever a match is
found for the specified pattern graph. The resulting threat warning graph is depicted in
Figure 9.
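
The incremental behaviour described above can be sketched as follows. The distance update shown is the simple textbook rule for a single edge insertion in an unweighted directed graph; it is a stand-in for illustration and is not the algorithm of [29]. The class name, the event stream, and the warning condition are all hypothetical.

```python
INF = float("inf")


class IncrementalDistances:
    """Maintain all-pairs hop distances as edges are inserted one at a time."""

    def __init__(self):
        self.dist = {}  # dist[x][y] = length of the shortest known directed path

    def add_node(self, node):
        if node not in self.dist:
            for row in self.dist.values():
                row[node] = INF
            self.dist[node] = {other: INF for other in self.dist}
            self.dist[node][node] = 0

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        d = self.dist
        for x in d:
            to_u = d[x][u]
            if to_u == INF:
                continue
            for y in d:
                # A new shortest path may run x -> ... -> u -> v -> ... -> y.
                through_new_edge = to_u + 1 + d[v][y]
                if through_new_edge < d[x][y]:
                    d[x][y] = through_new_edge

    def within(self, x, y, bound):
        return self.dist.get(x, {}).get(y, INF) <= bound


# Hypothetical stream of edges extracted from reports, in arrival order.
stream = [("p1", "p2"), ("p3", "car1"), ("p2", "fert1")]

tracker = IncrementalDistances()
warnings = []
for edge in stream:
    tracker.add_edge(*edge)
    # Hypothetical warning rule: person p1 comes within two hops of fertilizer.
    if tracker.within("p1", "fert1", 2):
        warnings.append(edge)

print(warnings)
```

Only the third edge triggers the warning, because it links the earlier reports into a path from `p1` to `fert1`. A full system would rerun the pattern search, rather than a single distance check, after each update.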

Figure 7. The raw data for our example is contained in two sets of reports. The color highlights
show  reports  that  are  linked  to  IED  activity. 


Figure 8. The full RDF data graph, including IED-related activity.


Figure  9.  The  threat  warning  graph  generated  by  our  graph  pattern  match  algorithm. 


5.  Preliminary  performance  evaluation. 

We tested our prototype against a well-developed library implementation of SPARQL. Figure 10
summarizes  the  results.  We  used  synthetic  data  sets  so  that  we  could  vary  the  size  consistently 
and  control  the  complexity  of  the  RDF  sets  [18].  Our  graph  pattern  algorithm  prototype  consisted 
of  a  680  line  implementation  in  Python  using  the  networkx  graph  library  [27]  and  the  rdflib  RDF 
library  [31].  The  SPARQL  query  prototype  was  a  200  line  implementation  in  Python  using  the 
rdflib  RDF  library  and  the  rdflib  SPARQL  library. 

The results of our test show that the running times of the two implementations were significantly
different at a confidence level of 95 percent. The difference in run time for the SPARQL
implementation  for  the  10,000  and  100,000  RDF  triple  tests  was  not  statistically  significant  at  a 
95  percent  confidence  level.  The  results  are  shown  in  Figure  10. 

Figure 10. A comparison of graph pattern search and SPARQL queries. Total execution time
for  10  executions  each  of  5  random  pattern  searches  in  synthetic  data  sets. 


Statistics  were  gathered  for  the  six  test  runs  of  the  prototype  graph  pattern  search  implementation 
and  SPARQL  query  standard  algorithm.  The  execution  environment  was  a  Dell  Precision  T1500 
desktop  PC  with  Core  i7  processor,  8GB  of  RAM,  Windows  7  operating  system,  and  a  one  TB 
hard  drive.  Fifty  runs  were  made  for  each  RDF  triple  graph.  From  these  preliminary  tests  we 
conclude  that  our  graph  pattern  search  algorithms  have  a  runtime  performance  that  is  acceptable 
at  this  early  stage  of  investigation.  We  are  currently  engaged  in  implementing  a  more  extensive 
prototype  which  we  can  use  to  test  in  more  realistic  data  environments. 
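
The measurement described in the caption of Figure 10 (total time for 10 executions each of 5 pattern searches) can be sketched with a small timing harness. The function name `total_search_time` and the placeholder search function are assumptions made for illustration; the harness does not reproduce the prototype or its data sets.

```python
import time


def total_search_time(search_fn, patterns, repeats=10):
    """Return total wall-clock seconds for `repeats` runs of each pattern search."""
    start = time.perf_counter()
    for pattern in patterns:
        for _ in range(repeats):
            search_fn(pattern)
    return time.perf_counter() - start


def placeholder_search(pattern):
    """Hypothetical stand-in for a graph pattern search or a SPARQL query."""
    return sum(len(term) for term in pattern)


patterns = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h"), ("i", "j")]
elapsed = total_search_time(placeholder_search, patterns, repeats=10)
print(f"{len(patterns) * 10} searches took {elapsed:.6f} s")
```

Running the same harness over both implementations and each data set size yields comparable totals, which can then be compared for statistical significance.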


6.  Summary. 

In  summary  we  have  presented  an  analysis  of  the  generalized  EIW  multi-INT  data  environment 
and  an  approximate  graph  pattern  search  algorithm  for  identifying  complex  events.  We  tested  our 
prototype  algorithms  against  standard  search  algorithms  and  determined  that  the  performance  is 
acceptable.  In  our  view  the  preliminary  performance  of  the  system  is  adequate  to  justify  further 
research  investment.  There  is  a  wide  range  of  potential  EIW  datasets. 

Our  future  work  will  involve  continued  development  of  the  prototype  discussed  in  this  paper. 

The  objective  of  the  next  phase  of  research  will  be  demonstrating  operation  in  the  streaming 
environment  of  a  Marine  Corps  exercise.  We  will  also  seek  to  evaluate  and  better  understand  the 
developing  data  environment  and  adapt  our  algorithms  as  necessary. 

Problem: Dynamic Data

Constantly changing data from many sources leads to fragmented, inefficient search methods.

Approach: Characterize and Investigate

1. Observe actual composition and structure, and define nature of changes.

2. Investigate areas of theory that can accommodate observations.

3. Investigate initial feasibility.

[Figure 1 diagram labels: Multi-Source Intelligence (Multi-INT); Processing; Common Access
Storage; E/R Extraction; Sentiment; Tracking; Federated Storage in RDF* Format;
http://ltsn.onr/DisasterEvent]

* RDF: Resource Description Framework (W3C Std.)
http://www.w3.org/TR/rdf-primer/

The Event Continuum

Static Networks          Dynamic Networks          Event Sequences
• Social networks        • Temporal persistence    • Sequence dominates
• Concurrent             • Complex relations       • Low persistence
• Simple relations       • Complex dependencies    • Independent events
• Simple dependencies    • Evolution over time     • No concurrency

Three distinct regimes in the literature. Complex Event Processing in EIW event streams rarely has
the characteristics of Static Networks.

Characterization:

• Variable temporal persistence.
• Independent events.
• Complex dependencies.
• Evolution.
• Sequential events.
• Concurrent events.

Investigate:

• Dynamic graph theory.

Initial feasibility:

• Algorithm prototype.

A Framework and Algorithms to infer complex events
from high-volume streams of events in near-real time.


Technical Approach

The approach involves the following steps:

1) Event Streams: Investigate structure and phenomenology of EIW events and event
streams. Augment streams by adding context, time stamps, pedigree and graph structure.

2) Data Framework: Tag and encode the data using lexicons, schemas, and ontologies.
Investigate dynamic graph theory towards scalable graph update and search.

3) CEP Algorithms: Develop and implement new dynamic graph algorithms for approximate
graph pattern search. Evaluate algorithms to identify relevant patterns, activities, and events.

4) MOP Evaluation: Establish means to measure and improve performance.

[Map figure labels: US FOB (Lat/Long 34.6720 -77.2402; MGRS 18STD9474638954); NAI 1 (JAFARNI
VILLAGE)]

IIMEF: Second Marine Expeditionary Force

KLE IPB Data Graph

TACREPS:
TACREP_1-02.txt, TACREP_2-03.txt, TACREP_2-04.txt, TACREP_2-06.txt, TACREP_3-02.txt,
TACREP_4-02.txt, TACREP_4-04.txt, TACREP_4-06.txt, TACREP_4-09.txt, TACREP_4-11.txt,
TACREP_4-12.txt, TACREP_4-13.txt, TACREP_4-16.txt, TACREP_4-18.txt, TACREP_4-19.txt,
TACREP_4-21.txt

Draft Intelligence Information Reports:
DIIR_1-01.txt, DIIR_1-03.txt, DIIR_1-04.txt, DIIR_1-05.txt, DIIR_1-06.txt, DIIR_1-08.txt,
DIIR_2-01.txt, DIIR_2-02.txt, DIIR_2-05.txt, DIIR_3-01.txt, DIIR_3-03.txt, DIIR_3-04.txt,
DIIR_4-01.txt, DIIR_4-03.txt, DIIR_4-05.txt, DIIR_4-08.txt, DIIR_4-10.txt, DIIR_4-14.txt,
DIIR_4-17.txt

Dynamic Data Graph is built incrementally, as information becomes available.

Threat patterns can be detected at any time.


IPB: Intelligence Preparation for the Battlefield.

Threat Pattern Graph

Watch for a Person with a previous involvement in IED Activity, and access to Fertilizer, a
Vehicle, and a Storage Facility.

Given some hierarchy of objects and activities:

Watch the data as the graph changes and warn of any matching pattern.


Threat Warning Graph

Initial Performance Comparisons

SPAWAR Systems Center Pacific

Graph Pattern algorithm prototype:

• 680 line implementation in Python.
• Uses networkx graph library.
• Uses rdflib RDF library.

SPARQL query prototype:

• 200 line implementation in Python.
• Uses rdflib RDF library.
• Uses rdflib SPARQL library.

Execution environment: Dell Precision T1500, Core i7 processor, 8GB RAM, Windows 7 OS, 1TB HD.

09/18/11


Results

• Completed initial studies of EIW data streams and content.

• Completed initial prototype of graph pattern search algorithms.

• Completed initial performance comparisons with standard search
algorithms (naive implementation was no worse than current
standard algorithms, with improved capabilities).

• Developed graph encoding framework, including activity and
event hierarchy.

• Developed preliminary methods of performance assessment for
activity discovery.

• Implemented prototype incremental graph.

Issues

Data issues:

• Access to suitable data with documentation.
• Relatively small data set sizes.
• Time constraints for processing and encoding data.

Future Work

• Mature graph encoding work for wider application and
greater robustness.

• Automate conditioning of data to facilitate larger-scale
testing.

• Participate in user experiments, when/where possible.

• Investigate additional application areas with larger data
sets.